Automatic reference independent evaluation of prosody quality using multiple knowledge fusions
Authors
Abstract
Automatic evaluation of GOR (Goodness Of pRosody) is an advanced and challenging task in CALL (Computer Aided Language Learning) systems. Beyond traditional prosodic features, we develop a method based on multiple knowledge sources that requires no prior knowledge of the reading text. After speech recognition, in addition to most state-of-the-art features used in prosodic analysis, we derive a more concise and effective feature set from two further knowledge sources: the generation of prosody, modeled with the Fujisaki model, and the influence of tempo on prosody, captured as the variability of prosodic components with the PVI (Pairwise Variability Index) method. We also propose a boosting training approach that requires no annotation and works by mining a larger corpus. Experiments on the GOR scores of 1297 speech samples from a group of excellent Chinese students aged 14 to 16 lead to the following conclusions. On the one hand, adding the knowledge sources on the generation and impact of prosody yields a 1.76% reduction in EER and a 0.036 increase in correlation compared with prosodic features alone. On the other hand, the final result can be considerably improved by the boosting training approach and a topic-dependent scheme.
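The abstract names the PVI (Pairwise Variability Index) as its measure of tempo-related variability but gives no formula. Below is a minimal sketch of the two standard PVI variants from the rhythm literature: the raw rPVI (mean absolute difference between successive intervals) and the rate-normalized nPVI (each difference divided by the mean of its pair, scaled by 100). It assumes the input is a list of successive interval durations, such as vowel or syllable durations in seconds. Which prosodic components and units the paper actually feeds into PVI are not stated here, and the function names are hypothetical.

```python
from typing import Sequence


def raw_pvi(durations: Sequence[float]) -> float:
    """Raw PVI (rPVI): mean absolute difference between successive intervals."""
    if len(durations) < 2:
        raise ValueError("PVI needs at least two intervals")
    # One term per adjacent pair (d_k, d_{k+1}); there are m - 1 pairs.
    diffs = [abs(a - b) for a, b in zip(durations, durations[1:])]
    return sum(diffs) / len(diffs)


def normalized_pvi(durations: Sequence[float]) -> float:
    """Normalized PVI (nPVI): each successive difference is divided by the
    mean of the pair, which cancels overall speaking rate; result is x100."""
    if len(durations) < 2:
        raise ValueError("PVI needs at least two intervals")
    if any(d <= 0 for d in durations):
        # Durations must be positive, otherwise the pair mean can be zero.
        raise ValueError("durations must be positive")
    terms = [
        abs(a - b) / ((a + b) / 2)
        for a, b in zip(durations, durations[1:])
    ]
    return 100 * sum(terms) / len(terms)


if __name__ == "__main__":
    # Hypothetical alternating long/short intervals vs. perfectly even ones.
    print(raw_pvi([1, 3, 1]), normalized_pvi([1, 3, 1]))
    print(raw_pvi([2, 2, 2]), normalized_pvi([2, 2, 2]))
```

For example, the alternating sequence `[1, 3, 1]` gives an rPVI of 2.0 and an nPVI of 100.0, while an even sequence gives 0 for both. Because nPVI divides by the local mean, scaling every duration by the same factor (a uniformly faster or slower speaker) leaves it unchanged, which is why the normalized form is usually preferred when tempo differs across speakers.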
Similar resources
The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...
Automatic Prosody Quality Evaluation of Mandarin Speech
Prosody evaluation is an essential part of computer-aided language learning system. In the paper, we investigate an automatic prosody evaluation method for Mandarin speech. The method is based on prosody comparison between the tested and standard utterance. The prosodic similarities are calculated from three aspects: tone, intonation and rhythm. Based on these similarities, a ranking algorithm ...
An Acoustic Study of Emotivity-Prosody Interface in Persian Speech Using the Tilt Model
This paper aims to explore some acoustic properties (i.e. duration and pitch amplitude of speech) associated with three different emotions: anger, sadness and joy against neutrality as a reference point, all being intentionally expressed by six Persian speakers. The primary purpose of this study is to find out if there is any correspondence between the given emotions and prosody patterning in P...
Automatic Assessment of Non-Native Prosody for English as L2
We recorded non-native English productions of 55 speakers; a subset of these productions was assessed by 60 native English speakers as for their quality w. r. t. intelligibility, rhythm, etc. Applying multiple linear regression on a large prosodic feature vector – modelling approaches known from the literature as well as generic prosody – we can automatically predict the listener’s assessments ...
Optionality in evaluating prosody prediction
This paper concerns the evaluation of prosody prediction at the symbolic level, in particular the locations of pitch accents and intonational boundaries. One evaluation method is to ask an expert to annotate text prosodically, and to compare the system’s predictions with this reference. However, this ignores the issue of optionality: there is usually more than one acceptable way to place accent...